feat(demos): manymove_industrial - BT manipulator + medkit gateway demo#59
mfaferek93 wants to merge 22 commits into
manymove BT pipeline + ros2_medkit fault reporting in a Docker compose
demo. v1 ships:
- Dockerfile pulling ros-jazzy-ros2-medkit-* debs and the selfpatch
manymove fork (feat/medkit-integration).
- SOVD manifest covering manymove BT client, move_group and the medkit
fault management stack.
- Three container scripts (arm-self-test, inject-collision,
restore-normal) that exercise the fault pipeline via
/fault_manager/report_fault.
- run-demo.sh / stop-demo.sh / check-demo.sh + a CI smoke test under
tests/smoke_test_manymove_industrial.sh.
OpenPLC + OPC UA bridge for the tier-2 PLC correlation narrative are
deferred to v1.5; see demos/manymove_industrial/README.md "TODO".
Mirrors the moveit_pick_place pattern: docker compose up with the CI profile, run tests/smoke_test_manymove_industrial.sh, dump container logs on failure, and tear down in an always-run cleanup step.
…docker style
The image now reproduces manymove_bringup/docker (manymove + Groot + xarm_ros2 from source) and layers on the medkit fault_manager / gateway / Web UI plus our SOVD manifest and container scripts.
- The MANYMOVE_REPO build arg defaults to the selfpatch fork on feat/medkit-integration.
- demo.launch.py now includes the upstream xarm7_movegroup_fake_cpp_trees.launch.py verbatim, so the BT pipeline matches the project's own demos and the manymove-instrumented BT nodes emit MANYMOVE_* fault codes organically when the BT trips.
- Inject scripts moved from synthesising reports on /fault_manager/report_fault to flipping BT blackboard flags via the HMI update_blackboard service (real BT triggers); inject-soft-fault adds a thin collision wall to drive RETRY_ATTEMPT bursts through LocalFilter.
- SOVD manifest expanded with the real xArm7 FQNs (ufactory_driver, action_server_node, object_manager_node, hmi_service_node, move_group, bt_client_xarm7).
- Manymove HMI Qt + Groot ride along via X11 forwarding on the cpu profile.
…rvice path
Two fixes after running the rebuilt image locally:
- demo.launch.py: ros2_medkit_gateway needs namespace="diagnostics" so
medkit_params.yaml's "diagnostics:" section resolves and the gateway
binds 0.0.0.0:8080 instead of localhost-only. Without this, host
curl to /api/v1/health returned RST-on-recv even though the gateway
was alive inside the container.
- inject-*/restore-normal scripts: HMI service is exposed at
/update_blackboard, not /hmi_service_node/update_blackboard.
update_blackboard expects every value as a quoted string (the .srv
declares value_data as string[]), so pass "true" rather than a bare
true. Also dropped "set -u", which trips on unbound variables in the
ROS 2 setup.bash.
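The quoting rule above can be sketched as a small helper. This is an illustration only: the helper name `to_value_data` is hypothetical, and nothing about the .srv beyond value_data being string[] is taken from the sources.

```python
# Hypothetical helper illustrating the fix above: update_blackboard takes
# value_data as string[] per the .srv, so every value -- including booleans
# and numbers -- must be serialised as a string before the service call.

def to_value_data(values):
    """Serialise arbitrary blackboard values into the string[] the service expects."""
    out = []
    for v in values:
        if isinstance(v, bool):
            # Check bool before anything else: bool is a subclass of int in Python,
            # and the blackboard flags want lowercase "true"/"false".
            out.append("true" if v else "false")
        else:
            out.append(str(v))
    return out

print(to_value_data([True, 3, "idle"]))  # ['true', '3', 'idle']
```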
The BT XML wires MoveManipulatorAction's collision_detected input port to the same blackboard key that inject-collision flips, but because the timing of MoveManipulator's onStart reading the port varies relative to the BT loop, we sometimes observe the retry-exhausted path instead of the collision branch. Either outcome proves the round-trip works. Bumped the post-inject sleep to 6 s so the BT has room to tick the retry cycle to completion before we poll the fault list.
The Web UI runs on :3000 and fetches the gateway on :8080; without a CORS allow-origin response header the browser blocks the cross-origin requests with "Failed to fetch". Mirror the cors block from moveit_pick_place's medkit_params.yaml: allow any origin, all standard methods, and the two headers the Web UI sends. Verified locally with `curl -I -H 'Origin: http://localhost:3000'` - the gateway now returns Access-Control-Allow-Origin reflecting the origin.
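A minimal sketch of what that cors block could look like. The key names and header values here are assumptions based on the description above; the authoritative block is the one in moveit_pick_place's medkit_params.yaml.

```yaml
# Sketch only -- copy the real block from moveit_pick_place's medkit_params.yaml.
cors:
  allowed_origins: ["*"]          # any origin, per the commit above
  allowed_methods: ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
  allowed_headers: ["Content-Type", "Authorization"]  # assumed: the two headers the Web UI sends
```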
…napshot capture
Adds default_topics for freeze-frame snapshots and explicit rosbag topics / format / size limits so the fault-attached MCAP actually contains BT and tf data instead of an empty bag.
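The shape of that snapshot config, as a sketch. The topic list, MCAP format, and size caps come from the PR summary; the exact field names are assumptions, so check medkit_params.yaml in the demo for the real schema.

```yaml
# Sketch only -- field names assumed; see the demo's medkit_params.yaml.
snapshot:
  default_topics:       # freeze-frame topics attached to each fault
    - /joint_states
    - /tf
    - /tf_static
    - /blackboard_status
    - /planning_scene
  storage_format: mcap  # MCAP instead of sqlite3
  max_bag_size_mb: 50   # per-bag cap
  max_total_size_mb: 500
```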
… shellcheck source directives
The CI gate hit two issues:
1. Smoke test POST /components/manymove-planning/scripts/<x>/executions returned 404 because the manifest only declared manymove-bt; the container_scripts directory name (manymove-planning) had no matching component. Added a manymove-planning component entry mirroring the moveit_pick_place demo pattern (moveit-planning component + same-named container_scripts dir).
2. shellcheck SC1091 on every 'source /opt/...setup.bash' line because those files do not exist on the host runner. Added 'shellcheck source=/dev/null' directives, matching the multi_ecu_aggregation convention.
…BT-motion dependency
The previous assertion ('MANYMOVE_PLANNER_* fault appears after
inject-collision') is brittle in CI: setting the BT blackboard
'collision_detected' flag only triggers a fault when
MoveManipulatorAction::onStart actually ticks. The CI fake-hardware launch
does not auto-issue motion goals, so the BT remains idle and the flag is
read by nothing.
Replace with:
- Loop over inject-collision + restore-normal endpoints to prove the
manifest <-> container_scripts component-id binding (the previous
manymove-planning 404 root cause).
- arm-self-test script execution + poll for MANYMOVE_SELFTEST fault: this
exercises the medkit REST -> FaultManager pipeline directly via
/fault_manager/report_fault, with no BT trajectory state dependency.
Real BT-emitted fault verification stays the responsibility of the
record_full.sh demo runs, which do start moves and observe the full
round-trip.
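The poll step in the revised smoke test can be sketched as a generic helper. Everything here is illustrative: the helper name, the stubbed /faults payloads, and the fault-code check are not taken from the test itself, only from the behaviour described above.

```python
import time

def poll_until(fetch, predicate, timeout_s=30.0, interval_s=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll fetch() until predicate(result) is truthy or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if predicate(result):
            return result
        if clock() >= deadline:
            return None  # caller treats None as a failed assertion
        sleep(interval_s)

# Example with a stubbed fault list standing in for a live GET /faults:
responses = iter([[], [], [{"code": "MANYMOVE_SELFTEST"}]])
found = poll_until(
    fetch=lambda: next(responses),
    predicate=lambda faults: any(f["code"] == "MANYMOVE_SELFTEST" for f in faults),
    timeout_s=5.0,
    sleep=lambda _: None,  # no real waiting in the stubbed example
)
print(found)  # [{'code': 'MANYMOVE_SELFTEST'}]
```

Injecting `clock` and `sleep` keeps the helper unit-testable without real delays, which matters in a CI smoke test that already waits on container startup.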
Adds a PLC simulator (asyncua-based OPC UA server) and an OPC UA -> medkit fault bridge so PLC AlarmConditionType events land in the same medkit FaultManager that aggregates manymove BT-side faults. Both faults appear in one dashboard with distinct source_ids, demonstrating cross-source correlation as the actual differentiator over single-source logging.

The PLC sim exposes three canonical alarms (photoeye_flicker / WARN, conveyor_overspeed / ERROR, estop_engaged / CRITICAL) plus an admin HTTP endpoint so container_scripts and demo orchestrators can raise/clear alarms without speaking OPC UA themselves. It is designed to be swappable with a real OpenPLC v3 + ST program once the IEC 61131-3 build pipeline is set up; the OPC UA surface (AlarmConditionType events on namespace 2) stays identical.

The bridge is a ROS 2 Python node (rclpy + asyncua) that subscribes to AlarmConditionType events and calls /fault_manager/report_fault for each, with SourceName -> MANYMOVE_PLC_* fault code mapping. Loopback prevention drops events whose SourceName starts with our own source_id.

The manifest gains a conveyor-line area, four PLC-side components (openplc, photoeye-pick, photoeye-drop, conveyor-motor), an opcua-bridge component, plus matching apps and a fault-aggregation function tying the bridge to the existing FaultManager + gateway. The smoke test now exercises the conveyor-line container_scripts (inject-photoeye-flicker, restore-line) and asserts MANYMOVE_PLC_* faults round-trip through the bridge into medkit.
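The bridge's SourceName -> fault-code mapping and loopback filter can be sketched as pure functions. The alarm names and severities come from the commit above; the function name, the exact BRIDGE_SOURCE_ID value, and the fallback severity are assumptions for illustration — the real node lives in the opcua_bridge package.

```python
# Sketch of the mapping logic described above; names are illustrative.

# The three canonical PLC sim alarms and their severities (from the commit).
ALARM_SEVERITY = {
    "photoeye_flicker": "WARN",
    "conveyor_overspeed": "ERROR",
    "estop_engaged": "CRITICAL",
}

BRIDGE_SOURCE_ID = "/plc/opcua_bridge"  # assumed value for illustration

def map_alarm(source_name):
    """Translate an OPC UA SourceName into a (MANYMOVE_PLC_* code, severity) pair.

    Returns None for our own re-emitted events (loopback prevention).
    """
    if source_name.startswith(BRIDGE_SOURCE_ID):
        return None  # drop events whose SourceName starts with our source_id
    code = "MANYMOVE_PLC_" + source_name.upper()
    return code, ALARM_SEVERITY.get(source_name, "WARN")  # WARN fallback is an assumption

print(map_alarm("photoeye_flicker"))  # ('MANYMOVE_PLC_PHOTOEYE_FLICKER', 'WARN')
```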
pip uninstall fails on cryptography 41.0.7 because the package is managed by apt and has no RECORD file. --ignore-installed skips the uninstall step so asyncua's newer cryptography dependency can be installed alongside it.
…d OPC UA endpoint
The asyncua server advertises an endpoint URL that the client reconnects to after the initial bind. With the default 0.0.0.0 bind, that advertised URL is not resolvable from other containers. Pin both the service hostname and the advertised OPC UA endpoint to 'plc-sim' so the bridge stays on the docker-compose service-name DNS path.
Setting container_name suppresses Docker compose's default service-name network alias on user-defined bridges, so plc-sim was no longer resolvable as 'plc-sim' from other containers. Add the alias back explicitly.
…bridge
container_name on a user-defined bridge network suppresses the default service-name DNS alias. Even with an explicit aliases entry, the embedded resolver was not registering 'plc-sim' for some reason; CI kept getting 'Temporary failure in name resolution'. Removing container_name lets compose use the default service-name alias path, which is the well-trodden case.
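The resulting compose shape, sketched under assumptions: the service and network names match the PR (plc-sim, medkit-net), but the surrounding file layout is illustrative.

```yaml
# Sketch only: rely on the default service-name alias; no container_name.
services:
  plc-sim:
    # no container_name here -- compose then registers 'plc-sim' as the
    # DNS name on the user-defined bridge automatically
    networks:
      - medkit-net
networks:
  medkit-net:
    driver: bridge
```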
Logs the resolved IP for the OPC UA endpoint hostname at startup so future 'Temporary failure in name resolution' loops are diagnosable without docker exec.
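A startup log line like that can be sketched with the standard library alone; the function name and log prefix are illustrative, not from the bridge source.

```python
import socket

def log_resolved_ip(hostname):
    """Resolve hostname once at startup; return the IP string, or None on failure."""
    try:
        ip = socket.gethostbyname(hostname)
    except OSError as exc:
        # Surfaces 'Temporary failure in name resolution' in the container log
        # so the failure is visible without a docker exec.
        print(f"[opcua_bridge] cannot resolve {hostname}: {exc}")
        return None
    print(f"[opcua_bridge] {hostname} resolved to {ip}")
    return ip

log_resolved_ip("localhost")
```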
…node
AlarmConditionType events require the source to be an Object node that supports the EventNotifier attribute. The plc_sim was passing Variable nodes as event sources (Photoeye/Conveyor/Estop tags), which made asyncua crash at startup with BadAttributeIdInvalid when set_event_notifier ran. Use the standard Server object as the emitter for all three alarms; SourceName in the event already disambiguates which alarm fired. Verified end-to-end locally: POST /alarm/photoeye_flicker/raise lands as MANYMOVE_PLC_PHOTOEYE_FLICKER CONFIRMED with source_id=/plc/sensor_io in the medkit dashboard.
…ax smoke poll filters
The container_scripts/conveyor-line/ directory expects a manifest component with id 'conveyor-line' (mirroring the manymove-planning pattern earlier). Without it, the gateway returned 404 on inject-photoeye-flicker / restore-line script executions. Also relax the smoke test status filters: MANYMOVE_SELFTEST is severity 0 INFO, so it may not pass debounce to CONFIRMED in time; just check the fault appears in /faults at all. For PLC heal, accept PREPASSED in addition to HEALED, since the healing threshold may not be crossed within the 30 s window from a single PASSED event.
…lision
Adding component 'conveyor-line' (for script binding) made the id collide with the area also called 'conveyor-line'. Manifest validation rejected the whole file, so the gateway came up with no apps / components and every smoke assertion failed (not just the new PLC ones). The area is now 'line'; the component stays 'conveyor-line' so the container_scripts/conveyor-line/ directory still binds correctly via the gateway's component->scripts mapping.
severity=0 INFO doesn't pass the FaultManager debounce, and the FAILED -> PASSED pair clears the fault from the active list within the smoke poll window anyway. Keep the script-accepted check; the real REST round-trip proof comes from the PLC bridge section right below it (severity 1 WARN, real source_id, real opcua_bridge forwarding).
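The relaxed acceptance rules can be sketched as two small predicates. Function names, the sample status string in the test data, and the payload shape are illustrative, not from the smoke test itself.

```python
# Sketch of the relaxed smoke-test acceptance rules: INFO-severity faults
# only need to appear in /faults at all, and PLC heal accepts PREPASSED
# as well as HEALED.

def selftest_ok(faults):
    """MANYMOVE_SELFTEST is severity 0 INFO and may never reach CONFIRMED,
    so presence in the fault list is enough."""
    return any(f["code"] == "MANYMOVE_SELFTEST" for f in faults)

def plc_heal_ok(status):
    """A single PASSED event may not cross the healing threshold in 30 s,
    so PREPASSED counts as healed-enough for the smoke window."""
    return status in ("HEALED", "PREPASSED")

print(plc_heal_ok("PREPASSED"))  # True
```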
…namespace
Topic discovery in the medkit gateway routes nodes into SOVD entity tree slots by namespace. Without any published topic, the opcua_bridge was invisible in the entity tree even though its faults appeared in the dashboard. Publishing a 1 Hz heartbeat on /plc/heartbeat anchors the bridge under the conveyor-line area so operators see it next to the PLC components.
…e' area
opcua_bridge publishes /plc/heartbeat (under namespace /plc); without a matching area namespace, the gateway's topic discovery couldn't route the bridge into the SOVD entity tree. Anchor /plc under the 'line' area and move the opcua-bridge component there too. Drop the empty 'bridge' area.
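The namespace-to-area routing idea behind this fix can be sketched as a longest-prefix match over topic names. The mapping table and helper name are illustrative; the gateway's real discovery logic is not reproduced here.

```python
# Sketch of namespace-based entity routing: a node lands under the area
# whose namespace is the longest prefix of its published topics.
AREA_NAMESPACES = {"/plc": "line"}  # assumed: /plc anchored under the 'line' area

def area_for_topic(topic):
    """Return the area whose namespace is the longest prefix of topic, else None."""
    best = None
    for ns, area in AREA_NAMESPACES.items():
        if topic == ns or topic.startswith(ns + "/"):
            if best is None or len(ns) > len(best[0]):
                best = (ns, area)
    return best[1] if best else None

print(area_for_topic("/plc/heartbeat"))  # 'line'
```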
Summary
New demo
demos/manymove_industrial/: brings up a manymove BT manipulator pipeline alongside a ros2_medkit gateway, with a SOVD manifest that maps every BT executable to its component. Provides the runtime piece that consumes the feat/medkit-integration fork of manymove (PR selfpatch/manymove#1).
What changes
New demo:
- demos/manymove_industrial/: xarm-sim (manymove BT + fake hardware on Domain ID 42), medkit-gateway (FaultManager + REST), medkit-web-ui, on network bridge medkit-net.
- SOVD manifest (bt_client_* with ros_binding.namespace + node_name matching the renamed unique node names in the fork).
- medkit_params.yaml: rosbag snapshot config with explicit topic list (/joint_states, /tf, /tf_static, /blackboard_status, /planning_scene), MCAP format, 50 MB per-bag cap, 500 MB total.
- run-demo.sh, stop-demo.sh, check-demo.sh, inject-soft-fault.sh, restore-normal.sh.
Smoke test
tests/smoke_test_manymove_industrial.sh: brings up the stack, waits for gateway health, triggers a collision via the inject helper, polls /faults/active for a manymove planner fault (accepts either COLLISION_DETECTED or RETRIES_EXHAUSTED).
CI
build-and-test-manymove-industrial in .github/workflows/ci.yml, mirroring the existing build-and-test-moveit shape: builds the compose stack, runs the smoke test, uploads logs on failure.
Bug fixes (during integration)
Dependencies
selfpatch/manymove branch feat/medkit-integration (Feat/medkit integration manymove#1), so the BT action nodes actually emit medkit faults. Without that PR merged, the demo runs but no faults are reported through the native instrumentation path.